perm filename COMP.WRU[206,JMC] blob sn#005337 filedate 1971-01-05 generic text, type T, neo UTF8
00100	                     COMPUTER SCIENCE DEPARTMENT
00200	                         STANFORD UNIVERSITY
00300	
00400	
00500	CS 206           COMPUTING WITH SYMBOLIC EXPRESSIONS        FALL 1970
00600	
00700	
00800	        ADDITIONAL INFORMATION ABOUT THE THE COMPILER PROBLEM

00900	
01000	1. Some facts about the PDP-10.
01100	
01200		The following facts about the PDP-10 may be of use in writing
01300	the limited LISP compiler called for in the exercise:
01400	
01500		The  PDP-10  has  a  36  bit  word and an 18 bit address.  In
01600	instructions and in accumulators used as index registers this is  the
01700	right part of the word where the least significant bits in arithmetic
01800	reside.
01900	
02000		There are 16 general registers which serve simultaneously  as
02100	accumulators  (receiving the results of arithmetic operations), index
02200	registers (modifying the nominal addresses of  instructions  to  form
02300	effective addresses), and as the first 16 registers of memory (if the
02400	effective address of  an  instruction  is  less  than  16,  then  the
02500	instruction uses the corresponding general register as its operand.
02600	
02700		All instructions have the same format and are written for the
02800	LAP assembly program in the form
02900	
03000		(<op name> <accumulator> <address> <index register>).
03100	
03200	Thus %(MOVE 1 3 P)% causes accumulator %1% to receive the contents of
03300	a  memory  register whose address is %3+c(P), i.e. 3+<the contents of
03400	general register P.  If the op name is followed by %@,then the memory
03500	reference is indirect, i.e. the effective addrress is the contents of
03600	the right half of the word  directly  addressed  by  the  instruction
03700	(modified  by  the index register and indirect bit of that word).  In
03800	the following description of some instructions useful in constructing
03900	the compiler, <ef> denotes the effective address of an instruction.
04100	
04200	MOVE			c(ac) ← c(<ef>)
04300	MOVEI			c(ac) ← <ef>
04400	MOVEM			c(<ef>) ← c(ac)
04500	HLRZ			c(left half ac) ← right half of c(<ef>)
04600	HRRZ			c(right half ac) ←c(right half of c(<ef>)
04700		(These two are used indirectly for %car% and %cdr% respectively.)
04800	ADD			c(ac) ← c(ac) + c(<ef>)
04900	SUB			c(ac) ← c(ac) - c(<ef>)
05000	JRST			go to <ef>
05100	JUMPE			if c(ac) = 0 then go to <ef>
05200	JUMPN			if c(ac) ≠ 0 then go to <ef>
05300	CAME			if c(ac) = c(<ef>) then <skip next
05400	CAMN			if c(ac) ≠ 0 then <skip next instruction>
05500	PUSH			c(c(right half of ac)) ← c(<ef)); the contents
05600				of each half of ac is increased by one and if
05700				the contents of the left half is then %0, a stack
05800				overflow interrupt occurs.  (PUSH P ac) is
05900				is used in LISP to put c(ac) on the 
06000				stack.
06100	POP			c(<ef>) ←c(c(right half of ac)); the
06200				contents of each half of ac are then decreased
06300				by %1.  This may be used for removing the
06400				top element of the stack.  Thus (POP P 1) puts
06500				the top element of the stack in accumulator 1.
06600				The i-th element of the stack is obtained by
06700				(MOVE@ 1 i P).
06800	POPJ			(POPJ P) is used for returning from a subroutine
06900	
07000		These instructions are adequate for compiling basic LISP code
07100	with  the addition of the subroutine calling pseudo-instrucion. (CALL
07200	n (E <subr)) is used for calling the LISP subroutine <subr> with  %n%
07300	arguments.   The  convention  is that the arguments will be stored in
07400	successive accumulators beginning with accumulator %1, and the result
07500	will  be  reurned  in  accumulator  %1.   In particular the functions
07600	%ATOM% and %CONS% are called with %(CALL 1 (E ATOM))% and (CALL 2  (E
07700	CONS))  respectively,  and  programs produced by you compiler will be
07800	called similarly when their names are referred to.
08000	
08050	2. Code produced by LISP compilers.
08075	
08100		I have written two compilers, the  simple  one  called  LCOM1
08200	which  I  have distributed, and more optimising compiler called LCOM4
08300	which I am still improving. Currently, LCOM4 produces about  half  as
08400	many  instructions for a given function as does LCOM1. Besides these,
08500	there is the standard PDP-10  LISP  compiler  written  at  M.I.T.  by
08600	Richard  Greenblatt  and  Stuart  Nelson  and  modified  by Whitfield
08700	Diffie.
08500	
08600		Consider the function %union% for giving  the  union  of  two
08700	unordered lists.  We have
08900	
09000		union(u,v) ← if n u then v else if a u ε v then
09100			union(d u,v) else a u . union(d u,v).
09200	
09300	Its LISP definition is
09400	
09500	(DE UNION (U V) (COND ((NULL U) V) ((MEMBER (CAR U) V)
09600		(UNION (CDR U) V)) (T (CONS (CAR U) (UNION (CDR U) V)))))
09700	
09800	Here are the programs produced by the three compilers.
09900	
10000	The basic compiler LCOM1 generates the following 46
10050	instructions:
10100	(LAP UNION SUBR) 
10200	(PUSH P 1) 
10300	(PUSH P 2) 
10400	(MOVE 1 -1 P) 
10500	(JUMPN 1 G0157) 
10600	(MOVE 1 0 P) 
10700	(JRST G0156) 
10800	G0157 
10900	(MOVE 1 -1 P) 
11000	(HLRZ@ 1 1) 
11100	(PUSH P 1) 
11200	(MOVE 1 -1 P) 
11300	(PUSH P 1) 
11400	(SUB P (C 0 0 2 2)) 
11500	(MOVE 2 2 P) 
11600	(MOVE 1 1 P) 
11700	(CALL 2 (E MEMBER)) 
11800	(JUMPE 1 G0158) 
11900	(MOVE 1 -1 P) 
12000	(HRRZ@ 1 1) 
12100	(PUSH P 1) 
12200	(MOVE 1 -1 P) 
12300	(PUSH P 1) 
12400	(SUB P (C 0 0 2 2)) 
12500	(MOVE 2 2 P) 
12600	(MOVE 1 1 P) 
12700	(CALL 2 (E UNION)) 
12800	(JRST G0156) 
12900	G0158 
13000	(MOVE 1 -1 P) 
13100	(HLRZ@ 1 1) 
13200	(PUSH P 1) 
13300	(MOVE 1 -2 P) 
13400	(HRRZ@ 1 1) 
13500	(PUSH P 1) 
13600	(MOVE 1 -2 P) 
13700	(PUSH P 1) 
13800	(SUB P (C 0 0 2 2)) 
13900	(MOVE 2 2 P) 
14000	(MOVE 1 1 P) 
14100	(CALL 2 (E UNION)) 
14200	(PUSH P 1) 
14300	(SUB P (C 0 0 2 2)) 
14400	(MOVE 2 2 P) 
14500	(MOVE 1 1 P) 
14600	(CALL 2 (E CONS)) 
14700	(JRST G0156) 
14800	G0159 
14900	G0156 
15000	(SUB P (C 0 0 2 2)) 
15100	(POPJ P) 
15200	NIL 
15300	
15400		LCOM4 produces the following program:
15500	(LAP UNION SUBR) 
15600	(PUSH P 1) 
15700	(PUSH P 2) 
15800	(MOVE 1 -1 P) 
15900	(JUMPN 1 G0157) 
16000	(MOVE 1 0 P) 
16100	(JRST 0 G0156) 
16200	G0157 
16300	(HLRZ@ 1 -1 P) 
16400	(MOVE 2 0 P) 
16500	(CALL 2 (E MEMBER)) 
16600	(JUMPE 1 G0158) 
16700	(HRRZ@ 1 -1 P) 
16800	(MOVE 2 0 P) 
16900	(CALL 2 (E UNION)) 
17000	(JRST 0 G0156) 
17100	G0158 
17200	(HRRZ@ 1 -1 P) 
17300	(MOVE 2 0 P) 
17400	(CALL 2 (E UNION)) 
17500	(MOVE 2 1) 
17600	(HLRZ@ 1 -1 P) 
17700	(CALL 2 (E CONS)) 
17800	G0156 
17900	(SUB P (C 0 0 2 2)) 
18000	(POPJ P) 
18100	NIL 
18200	
18300		The  standard  PDP-10  LISP  compiler  produces the following
18400	code:
18600	
18700	(LAP UNION SUBR)  
18800		(PUSH P 1)  
18900		(PUSH P 2)  
19000		(JUMPN 1 G0002)  
19100		(MOVE 1 2)  
19200		(JRST 0 G0001)  
19300	G0002 	(HLRZ@ 1 1)  
19400		(CALL 2 (E MEMBER))  
19500		(JUMPE 1 G0003)  
19600		(MOVE 2 0 P)  
19700		(HRRZ@ 1 -1 P)  
19800		(CALL 2 (E UNION))  
19900		(JRST 0 G0001)  
20000	G0003 	(HLRZ@ 1 -1 P)  
20100		(MOVE 2 0 P)  
20200		(PUSH P 1)  
20300		(HRRZ@ 1 -2 P)  
20400		(CALL 2 (E UNION))  
20500		(POP P 2)  
20600		(CALL 2 (E XCONS))  
20700	G0008  
20800	G0001 	(SUB P (C 0 0 2 2))  
20900		(POPJ P)  
21000		NIL  
21100	 
21200	
21300		Note  that all three compilers produce the same first line of
21400	heading.  This is necessary for the LAP assembly program  which  also
21500	requires  the %NIL% at the end as punctuation.  The standard compiler
21600	uses a function called %XCONS% that has  the  same  effect  as  CONS%
21700	except  that  it receives its arguments in the reverse order which is
21800	sometimes convenient.  This requires, of course, that the compiler be
21900	smart  enough  to  decide  where  it  is  more  convenient to put the
22000	arguments of the %cons% function.  You may use %XCONS% if  you  like,
22100	but you may have better ways of using your time.
22200	
22300		My  two  compilers  are  similar in structure, but the better
22400	one, which is twice as long, uses the following optimisations.
22500	
22600		1. When the argument of CAR or CDR is a variable, it compiles
22700	a  (HLRZ@  1  i P) or (HRRZ@ 1 i P) which gets the result through the
22800	stack without first compiling the argument into an accumulator.
22900	
23000		2. When it has to set up the arguments of a function  in  the
23100	accumulators,  in general, the program must compute the arguments one
23200	at a time and save them on the stack, and then load the  accumulators
23300	from the stack.  However, if one of the arguments is a variable, is a
23400	quoted expression, or can be obtained from a variable by a  chain  of
23500	%cars%  and  %cdrs,  then  it  need not be computed until the time of
23600	loading  accumulators  since  it  can  be  computed  using  only  the
23700	accumulator in which it is wanted.
23800	
23900		3.  LCOM1  computes  Boolean  expressions badly and generates
24000	many unnecessary labels and JRSTs.  LCOM4 is more sophisticated about
24100	this.
24200	
24300		4.  The  Diffy  compiler  produces about 10 percent less code
24400	than LCOM4; the main difference seems to be that it sometimes notices
24500	when  it  has  an  expression that it will need later. Sometimes this
24600	feature leads it astray as in the above example wherein  saves  %a%u%
24700	for later use at greater cost than re-computing it.
24800	
24900		It  should  further be noted that quoted S-expressions can be
25000	compiled with the instruction
25100	
25200		(MOVEI 1 (QUOTE α)).
25300	
25400	As  someone  pointed  out  in  class,  the  compiler  as  distributed
25500	incorrectly used MOVE there.
25600	
25700		You   are   welcome   to   pirate   anything  you  want  from
25800	LCOM1[206,JMC].  I particularly recomment the function  COMPL  itself
25900	that  handles  all  the  I-O  in  such  a  way  as  to produce a file
26000	acceptable to LAP.
26100	
26200	3. Running the compilers.
26300	
26400		If you want to run one of these compilers, say LCOM4, to  see
26500	what sort of code it produces, here is the procedure:
26600	
26700		1. Prepare a file <file name> without an extension containing
26800	the functions to be compiled in  LISP  S-expression  form,  i.e.  (DE
26900	<function name> <variable list> <function body>).
27000	
27100		2. Give the system command RU LCOM4[206,JMC]<cr>.  LCOM4 will
27200	reply with *.
27300	
27400		3. Give the RLISP command COMPL <file name>;<cr>.   COMPL  is
27500	an  FEXPR  so  that  <file  name>  does not have to be preceded by '.
27600	LCOM4 will respond with a list of  the  functions  compiled  and  the
27700	lengths  of  the  compiled  programs  exaggerating slightly since all
27800	expressions in the output list are counted. <control C> will get  you
27900	back to the system and a file called <file name>.LAP has the compiled
28000	programs.
28100	
28200		4. You can look at the program by TYPE  <file  name>.LAP<cr>,
28300	and  if  you  want  to  try out the functions you can enter LISP by R
28400	LISP<cr> and after answering N to the ALLOC? question, you  can  read
28500	in  the  functions  by (DSKIN (<file name>.LAP))<cr>, and they can be
28600	tested in the usual LISP way.